Mining Informative Hydrologic Data by Using Support Vector Machines and Elucidating Mined Data according to Information Entropy

نویسنده

  • Shien-Tsung Chen
چکیده

The support vector machine is used as a data mining technique to extract informative hydrologic data on the basis of a strong relationship between error tolerance and the number of support vectors. Hydrologic data of flash flood events in the Lan-Yang River basin in Taiwan were used for the case study. Various percentages (from 50% to 10%) of hydrologic data, including those for flood stage and rainfall data, were mined and used as informative data to characterize a flood hydrograph. Information on these mined hydrologic data sets was quantified using entropy indices, namely marginal entropy, joint entropy, transinformation, and conditional entropy. Analytical results obtained using the entropy indices proved that the mined informative data could be hydrologically interpreted and have a meaningful explanation based on information entropy. Estimates of marginal and joint entropies showed that, in view of flood forecasting, the flood stage was a more informative variable than rainfall. In addition, hydrologic models with variables containing more total information were preferable to variables containing less total information. Analysis results of transinformation explained that approximately 30% of information on the flood stage could be derived from the upstream flood stage and 10% to 20% from the rainfall. Elucidating the mined hydrologic data by applying information theory enabled using the entropy indices to interpret various hydrologic processes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

کاربرد الگوریتم‌های داده‌کاوی در تفکیک منابع رسوبی حوزۀ آبخیز نوده گناباد

Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription and identification of critical areas within the watersheds. The sediment source ascription is involves two...

متن کامل

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are most important portion of biological sequences, which play crucial roles on corresponding sequence’s fold and functionality. Biggest class of the repetitive subsequences is “Transposable Elements” which has its own sub-classes upon contexts’ structures. Many researches have been performed to criticality determine the structure and function of repetitiv...

متن کامل

Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery

Land use is being considered as an element in determining land change studies, environmental planning and natural resource applications. The Earth’s surface Study by remote sensing has many benefits such as, continuous acquisition of data, broad regional coverage, cost effective data, map accurate data, and large archives of historical data. To study land use / cover, remote sensing as an effic...

متن کامل

Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data

This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part ofIranwas selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values.  Seismic surveying was performed next on these models. F...

متن کامل

Intelligent and Online Evaluation of Diabetes using Wireless Sensor Networks and Support Vector Machines Algorithm

Objective: International Diabetes Organization estimates that there are 285 million people worldwide who suffer from diabetes, and this figure is expected to increase to 450 million in next 20 years. According to statistics issued by the World Health Organization, diabetes is considered among ten leading causes of death in world and its prevalence in the population is growing.This paper deals w...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Entropy

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2015